An improved framework of GPU computing for CFD applications on structured grids using OpenACC

Authors

Abstract

This paper focuses on improving the multi-GPU performance of a research CFD code on structured grids. MPI and OpenACC directives are used to scale the code up to 16 GPUs. The paper shows that P100 and V100 GPUs can be 30× and 70× faster, respectively, than Xeon CPU E5-2680v4 cores for three different test cases. A series of performance issues related to scaling the multi-block CFD code are addressed by applying various optimizations. Performance optimizations such as the pack/unpack message method, removing temporary arrays as arguments to procedure calls, allocating global memory for limiters and connected boundary data, reordering non-blocking MPI I_send/I_recv and Wait calls, reducing unnecessary implicit movement of derived-type member data between the host and the device, and using GPUDirect improve compute utilization, throughput, and asynchronous progression in a multi-block CFD code that uses modern programming features.
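The pack/unpack message method named in the abstract can be illustrated with a minimal sketch in C with OpenACC directives. It assumes a single structured block stored as a flat array with i varying fastest and one ghost layer on each side; the layout, the macro `IDX3`, and the functions `pack_i_face`, `unpack_i_face`, and `exchange_i_faces` are hypothetical and are not taken from the paper. The idea is that a constant-i boundary plane is strided in memory, so it is gathered on the device into one contiguous buffer and sent as a single message per face rather than many small ones. The MPI calls are left out so the sketch compiles and runs serially when a compiler without OpenACC support ignores the `#pragma acc` lines.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical block layout: cell (i, j, k) of an ni x nj x nk array is stored
 * at (k * nj + j) * ni + i, so i varies fastest and a constant-i plane is
 * strided in memory. */
#define IDX3(i, j, k, ni, nj) \
    ((((size_t)(k) * (size_t)(nj)) + (size_t)(j)) * (size_t)(ni) + (size_t)(i))

/* Gather the strided plane i = i_face into the contiguous buffer buf[0:nj*nk].
 * Under OpenACC, q and buf must already be present on the device. */
void pack_i_face(const double *q, double *buf, int ni, int nj, int nk, int i_face)
{
    assert(i_face >= 0 && i_face < ni);
    #pragma acc parallel loop collapse(2) present(q[0:ni*nj*nk], buf[0:nj*nk])
    for (int k = 0; k < nk; ++k)
        for (int j = 0; j < nj; ++j)
            buf[(size_t)k * nj + j] = q[IDX3(i_face, j, k, ni, nj)];
}

/* Scatter buf[0:nj*nk] back into the plane i = i_ghost (a ghost layer). */
void unpack_i_face(double *q, const double *buf, int ni, int nj, int nk, int i_ghost)
{
    assert(i_ghost >= 0 && i_ghost < ni);
    #pragma acc parallel loop collapse(2) present(q[0:ni*nj*nk], buf[0:nj*nk])
    for (int k = 0; k < nk; ++k)
        for (int j = 0; j < nj; ++j)
            q[IDX3(i_ghost, j, k, ni, nj)] = buf[(size_t)k * nj + j];
}

/* Fill the low-i ghost plane (i = 0) of block b from the last interior plane
 * (i = ni - 2) of block a. Both blocks live in one process here; in an MPI
 * code buf would travel between ranks with MPI_Isend/MPI_Irecv, and with a
 * CUDA-aware MPI (GPUDirect) the device address of buf could be passed by
 * wrapping those calls in "#pragma acc host_data use_device(buf)". */
void exchange_i_faces(const double *a, double *b, double *buf, int ni, int nj, int nk)
{
    int n = ni * nj * nk;
    int m = nj * nk;
    assert(ni >= 3 && n > 0 && m > 0);
    #pragma acc data copyin(a[0:n]) copy(b[0:n]) create(buf[0:m])
    {
        pack_i_face(a, buf, ni, nj, nk, ni - 2);
        /* Non-blocking send/receive of buf[0:m] would go here (omitted). */
        unpack_i_face(b, buf, ni, nj, nk, 0);
    }
}
```

With an OpenACC compiler, packing and unpacking run on the GPU inside one `acc data` region, so only the contiguous buffer would need to cross the network; without one, the same functions run on the host unchanged.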

Similar resources

A Code Transformation Framework for Scientific Applications on Structured Grids

The combination of expert-tuned code expression and aggressive compiler optimizations is known to deliver the best achievable performance for modern multicore processors. The development and maintenance of these optimized code expressions is never trivial. Tedious and error-prone processes greatly decrease the code developer’s willingness to adopt manually-tuned optimizations. In this paper, we...

An improved FastSLAM framework using soft computing

FastSLAM is a framework for simultaneous localization and mapping (SLAM) using a Rao-Blackwellized particle filter. However, FastSLAM degenerates over time. This degeneracy is due to the fact that a particle set estimating the pose of the robot loses its diversity. One of the main reasons for losing particle diversity in FastSLAM is sample impoverishment. In this case, most of the particle weig...

An Improved Image Segmentation Algorithm Based on GPU Parallel Computing

In the process of image segmentation, the classic Fuzzy C-Means (FCM) algorithm is time-consuming and depends heavily on the initial cluster centers. Based on the Graphics Processing Unit (GPU), this paper proposes a novel FCM algorithm by improving the computational formulas of membership degree and the update criterion of cluster centers. Our algorithm can initialize cluster centers purposefully and fur...

An Effective Task Scheduling Framework for Cloud Computing using NSGA-II

Cloud computing is a model for convenient, on-demand user access to changeable and configurable computing resources such as networks, servers, storage, applications, and services with minimal management of resources and service provider interaction. Task scheduling is regarded as a fundamental issue in cloud computing which aims at distributing the load on the different resources of a distribu...

PACC: An Extension of OpenACC for Pipelined Processing of Large Data on a GPU

We present a suite of directives, named pipelined accelerator (PACC), and its implementation for accelerating large-scale computation on a graphics processing unit (GPU). PACC extends OpenACC to achieve division of large data that cannot be entirely stored in device memory. Given a program with PACC directives, our PACC translator rewrites the program into an OpenACC program such that data is d...

Journal

Journal title: Journal of Parallel and Distributed Computing

Year: 2021

ISSN: 1096-0848, 0743-7315

DOI: https://doi.org/10.1016/j.jpdc.2021.05.010